Large Matrix Multiplication on a Novel Heterogeneous Parallel DSP Architecture

Authors

  • Joar Sohl
  • Jian Wang
  • Dake Liu
Abstract

This paper introduces a novel master-multi-SIMD on-chip multi-core architecture for embedded signal processing and describes the parallel architecture and its memory subsystem. We evaluate the performance of large matrix multiplication on this architecture and compare it with a SIMD-extended data-parallel architecture. We also examine how well the new architecture scales with the number of SIMD co-processors. The experimental results show that the ePUMA architecture's memory subsystem can effectively hide the data access overhead. With its 8-way SIMD data path and multi-SIMD parallel execution, the ePUMA architecture achieves a 45x speedup on matrix multiplication over the conventional SIMD extension.
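As a rough illustration of the two levels of parallelism the abstract names (an 8-way SIMD data path and several SIMD co-processors working side by side), the following is a minimal sketch in plain Python. It is not the ePUMA implementation: the function names `simd_mac` and `matmul_multi_simd`, the `num_units` parameter, and the round-robin row split are all assumptions made for illustration, the units are simulated one after another rather than concurrently, and the memory subsystem is not modeled at all.

```python
LANES = 8  # SIMD width taken from the abstract; everything else is assumed


def simd_mac(acc, a_scalar, b_chunk):
    """Emulate one multiply-accumulate over up to 8 lanes: acc[i] += a * b[i]."""
    return [acc_i + a_scalar * b_i for acc_i, b_i in zip(acc, b_chunk)]


def matmul_multi_simd(a, b, num_units=4):
    """Multiply a (n x k) by b (k x m).

    Output rows are dealt round-robin to num_units hypothetical SIMD units,
    and each row is computed in chunks of LANES columns.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for unit in range(num_units):  # sequential here; concurrent on real hardware
        for i in range(unit, n, num_units):  # rows owned by this unit
            for j0 in range(0, m, LANES):  # one 8-lane chunk of the output row
                j1 = min(j0 + LANES, m)
                acc = [0] * (j1 - j0)
                for p in range(k):
                    acc = simd_mac(acc, a[i][p], b[p][j0:j1])
                c[i][j0:j1] = acc
    return c
```

Because each unit writes a disjoint set of output rows, the units need no synchronization on the result matrix in this sketch; how the actual architecture partitions work and moves data is not stated in the abstract.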

Similar resources

On The Design of High Performance Reconfigurable DSP processor using FPGA

In this paper, a high performance reconfigurable combined architecture of Discrete Wavelet Transform (DWT), Matrix Multiplication and Fast Fourier Transform is presented. This reduces area and becomes cost-effective. In the proposed DWT architecture the input data are separated into even and odd data, and both are input in parallel. This causes faster DWT operation than conven...

A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure

The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication which could run on a Fibonacci Hypercube structure. Most of the popular algorithms for parallel matrix multiplication cannot run on a Fibonacci Hypercube structure; therefore, a method that can run on all structures, especially the Fibonacci Hypercube structure, is necessary for parallel matr...

Ultra-Low-Energy DSP Processor Design for Many-Core Parallel Applications

Background and Objectives: Digital signal processors are widely used in energy constrained applications in which battery lifetime is a critical concern. Accordingly, designing ultra-low-energy processors is a major concern. In this work and in the first step, we propose a sub-threshold DSP processor. Methods: As our baseline architecture, we use a modified version of an existing ultra-low-power...

Implementation of Sparse Matrix Arithmetic on a DSP Processor

The paper presents a method for sparse matrix multiplication on a DSP processor. Its high efficiency is a consequence of the proposed pseudo-random data memory access and the parallelism of the multifunctional instructions of a DSP. Sparse matrix multiplication is implemented as linear expanded DSP code automatically generated by a specially designed program. The method is applied to predictive vecto...

BREAKING NEW GROUNDS OVER 3000 M MAC/s: A BROADBAND MOBILE MULTIMEDIA MODEM DSP

Future DSP architectures need to be developed for paving the way into a generation of high scale signal processing power, i.e. greater than 1000M MAC/s (multiply accumulate per second). This can only be reached today by introducing a new parallel processing paradigm into DSP architecture. Here we show that this requirement can be achieved already today at moderate clock speeds (100MHz) based on...

Publication date: 2009